Search | Portal Regional da BVS

1.

Comparative genomics of macaques and integrated insights into genetic variation and population history.

Zhang, Shilong; Xu, Ning; Fu, Lianting; Yang, Xiangyu; Li, Yamei; Yang, Zikun; Feng, Yu; Ma, Kaiyue; Jiang, Xinrui; Han, Junmin; Hu, Ruixing; Zhang, Lu; de Gennaro, Luciana; Ryabov, Fedor; Meng, Dan; He, Yaoxi; Wu, Dongya; Yang, Chentao; Paparella, Annalisa; Mao, Yuxiang; Bian, Xinyan; Lu, Yong; Antonacci, Francesca; Ventura, Mario; Shepelev, Valery A; Miga, Karen H; Alexandrov, Ivan A; Logsdon, Glennis A; Phillippy, Adam M; Su, Bing; Zhang, Guojie; Eichler, Evan E; Lu, Qing; Shi, Yongyong; Sun, Qiang; Mao, Yafei.

bioRxiv ; 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38645259

ABSTRACT

The crab-eating macaques (Macaca fascicularis) and rhesus macaques (M. mulatta) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ~42% lower, while centromeres are ~3.7 times longer than those in humans. The characterization of ~2 Mbp fixed genetic variants and ~240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting a potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1, SEMA3F, and HOXD13). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.

2.

The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Catacchio, Claudia R; Porubsky, David; Mao, Yafei; Yoo, DongAhn; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Ventura, Mario; Alexandrov, Ivan A; Eichler, Evan E.

Nature ; 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

3.

Structurally divergent and recurrently mutated regions of primate genomes.

Mao, Yafei; Harvey, William T; Porubsky, David; Munson, Katherine M; Hoekzema, Kendra; Lewis, Alexandra P; Audano, Peter A; Rozanski, Allison; Yang, Xiangyu; Zhang, Shilong; Yoo, DongAhn; Gordon, David S; Fair, Tyler; Wei, Xiaoxi; Logsdon, Glennis A; Haukness, Marina; Dishuck, Philip C; Jeong, Hyeonsoo; Del Rosario, Ricardo; Bauer, Vanessa L; Fattor, Will T; Wilkerson, Gregory K; Mao, Yuxiang; Shi, Yongyong; Sun, Qiang; Lu, Qing; Paten, Benedict; Bakken, Trygve E; Pollen, Alex A; Feng, Guoping; Sawyer, Sara L; Warren, Wesley C; Carbone, Lucia; Eichler, Evan E.

Cell ; 187(6): 1547-1562.e13, 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38428424

ABSTRACT

We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.

Subjects

Genome ; Primates ; Animals ; Humans ; Base Sequence ; Primates/classification ; Primates/genetics ; Biological Evolution ; Sequence Analysis, DNA ; Genomic Structural Variation

4.

Efficient formation of single-copy human artificial chromosomes.

Gambogi, Craig W; Birchak, Gabriel J; Mer, Elie; Brown, David M; Yankson, George; Kixmoeller, Kathryn; Gavade, Janardan N; Espinoza, Josh L; Kashyap, Prakriti; Dupont, Chris L; Logsdon, Glennis A; Heun, Patrick; Glass, John I; Black, Ben E.

Science ; 383(6689): 1344-1349, 2024 Mar 22.

Article in English | MEDLINE | ID: mdl-38513017

ABSTRACT

Large DNA assembly methodologies underlie milestone achievements in synthetic prokaryotic and budding yeast chromosomes. While budding yeast control chromosome inheritance through ~125-base pair DNA sequence-defined centromeres, mammals and many other eukaryotes use large, epigenetic centromeres. Harnessing centromere epigenetics permits human artificial chromosome (HAC) formation but is not sufficient to avoid rampant multimerization of the initial DNA molecule upon introduction to cells. We describe an approach that efficiently forms single-copy HACs. It employs a ~750-kilobase construct that is sufficiently large to house the distinct chromatin types present at the inner and outer centromere, obviating the need to multimerize. Delivery to mammalian cells is streamlined by employing yeast spheroplast fusion. These developments permit faithful chromosome engineering in the context of metazoan cells.

Subjects

Centromere ; Chromosomes, Artificial, Human ; Epigenesis, Genetic ; Humans ; Centromere/genetics ; Centromere/metabolism ; Chromatin/metabolism ; Chromosomes, Artificial, Human/genetics ; Chromosomes, Artificial, Human/metabolism ; Saccharomycetales/genetics

5.

Complete chromosome 21 centromere sequences from a Down syndrome family reveal size asymmetry and differences in kinetochore attachment.

Mastrorosa, F Kumara; Rozanski, Allison N; Harvey, William T; Knuth, Jordan; Garcia, Gage; Munson, Katherine M; Hoekzema, Kendra; Logsdon, Glennis A; Eichler, Evan E.

bioRxiv ; 2024 Feb 26.

Article in English | MEDLINE | ID: mdl-38464314

ABSTRACT

Down syndrome is the most common form of human intellectual disability caused by precocious segregation and nondisjunction of chromosome 21. Differences in centromere structure have been hypothesized to play a potential role in this process in addition to the well-established risk of advancing maternal age. Using long-read sequencing, we completely sequenced and assembled the centromeres from a parent-child trio where Trisomy 21 arose in the child as a result of a meiosis I error. The proband carries three distinct chromosome 21 centromere haplotypes that vary by 11-fold in length, with both the largest (H1) and smallest (H2) originating from the mother. The longest H1 allele harbors a less clearly defined centromere dip region (CDR) as defined by CpG methylation and a significantly reduced signal by CENP-A chromatin immunoprecipitation sequencing when compared to H2 or paternal H3 centromeres. These epigenetic signatures suggest less competent kinetochore attachment for the maternally transmitted H1. Analysis of H1 in the mother indicates that the reduced CENP-A ChIP-seq signal, but not the CDR profile, pre-existed the meiotic nondisjunction event. A comparison of the three proband centromeres to a population sampling of 35 completely sequenced chromosome 21 centromeres shows that H2 is the smallest centromere sequenced to date and all three haplotypes (H1-H3) share a common origin of ~15 thousand years ago. These results suggest that recent asymmetry in size and epigenetic differences of chromosome 21 centromeres may contribute to nondisjunction risk.

6.

Centromere innovations within a mouse species.

Gambogi, Craig W; Pandey, Nootan; Dawicki-McKenna, Jennine M; Arora, Uma P; Liskovykh, Mikhail A; Ma, Jun; Lamelza, Piero; Larionov, Vladimir; Lampson, Michael A; Logsdon, Glennis A; Dumont, Beth L; Black, Ben E.

Sci Adv ; 9(46): eadi5764, 2023 Nov 17.

Article in English | MEDLINE | ID: mdl-37967185

ABSTRACT

Mammalian centromeres direct faithful genetic inheritance and are typically characterized by regions of highly repetitive and rapidly evolving DNA. We focused on a mouse species, Mus pahari, that we found has evolved to house centromere-specifying centromere protein-A (CENP-A) nucleosomes at the nexus of a satellite repeat that we identified and termed π-satellite (π-sat), a small number of recruitment sites for CENP-B, and short stretches of perfect telomere repeats. One M. pahari chromosome, however, houses a radically divergent centromere harboring ~6 mega-base pairs of a homogenized π-sat-related repeat, π-satB, that contains >20,000 functional CENP-B boxes. There, CENP-B abundance promotes accumulation of microtubule-binding components of the kinetochore and a microtubule-destabilizing kinesin of the inner centromere. We propose that the balance of pro- and anti-microtubule binding by the new centromere is what permits it to segregate during cell division with high fidelity alongside the older ones whose sequence creates a markedly different molecular composition.

Subjects

Autoantigens ; Chromosomal Proteins, Non-Histone ; Mice ; Animals ; Chromosomal Proteins, Non-Histone/genetics ; Chromosomal Proteins, Non-Histone/metabolism ; Centromere/genetics ; Centromere/metabolism ; Centromere Protein A/genetics ; Nucleosomes ; Mammals/genetics

7.

Efficient Formation of Single-copy Human Artificial Chromosomes.

Gambogi, Craig W; Mer, Elie; Brown, David M; Yankson, George; Gavade, Janardan N; Logsdon, Glennis A; Heun, Patrick; Glass, John I; Black, Ben E.

bioRxiv ; 2023 Jun 30.

Article in English | MEDLINE | ID: mdl-37546784

ABSTRACT

Large DNA assembly methodologies underlie milestone achievements in synthetic prokaryotic and budding yeast chromosomes. While budding yeast control chromosome inheritance through ~125 bp DNA sequence-defined centromeres, mammals and many other eukaryotes use large, epigenetic centromeres. Harnessing centromere epigenetics permits human artificial chromosome (HAC) formation but is not sufficient to avoid rampant multimerization of the initial DNA molecule upon introduction to cells. Here, we describe an approach that efficiently forms single-copy HACs. It employs a ~750 kb construct that is sufficiently large to house the distinct chromatin types present at the inner and outer centromere, obviating the need to multimerize. Delivery to mammalian cells is streamlined by employing yeast spheroplast fusion. These developments permit faithful chromosome engineering in the context of metazoan cells.

8.

Assembly of 43 human Y chromosomes reveals extensive complexity and variation.

Hallast, Pille; Ebert, Peter; Loftus, Mark; Yilmaz, Feyza; Audano, Peter A; Logsdon, Glennis A; Bonder, Marc Jan; Zhou, Weichen; Höps, Wolfram; Kim, Kwondo; Li, Chong; Hoyt, Savannah J; Dishuck, Philip C; Porubsky, David; Tsetsos, Fotios; Kwon, Jee Young; Zhu, Qihui; Munson, Katherine M; Hasenfeld, Patrick; Harvey, William T; Lewis, Alexandra P; Kordosky, Jennifer; Hoekzema, Kendra; O'Neill, Rachel J; Korbel, Jan O; Tyler-Smith, Chris; Eichler, Evan E; Shi, Xinghua; Beck, Christine R; Marschall, Tobias; Konkel, Miriam K; Lee, Charles.

Nature ; 621(7978): 355-364, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established [1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.

Subjects

Chromosomes, Human, Y ; Evolution, Molecular ; Humans ; Male ; Chromosomes, Human, Y/genetics ; Genome, Human/genetics ; Genomics ; Mutation Rate ; Phenotype ; Euchromatin/genetics ; Pseudogenes ; Genetic Variation/genetics ; Chromosomes, Human, X/genetics ; Pseudoautosomal Regions/genetics

9.

The complete sequence of a human Y chromosome.

Rhie, Arang; Nurk, Sergey; Cechova, Monika; Hoyt, Savannah J; Taylor, Dylan J; Altemose, Nicolas; Hook, Paul W; Koren, Sergey; Rautiainen, Mikko; Alexandrov, Ivan A; Allen, Jamie; Asri, Mobin; Bzikadze, Andrey V; Chen, Nae-Chyun; Chin, Chen-Shan; Diekhans, Mark; Flicek, Paul; Formenti, Giulio; Fungtammasan, Arkarachai; Garcia Giron, Carlos; Garrison, Erik; Gershman, Ariel; Gerton, Jennifer L; Grady, Patrick G S; Guarracino, Andrea; Haggerty, Leanne; Halabian, Reza; Hansen, Nancy F; Harris, Robert; Hartley, Gabrielle A; Harvey, William T; Haukness, Marina; Heinz, Jakob; Hourlier, Thibaut; Hubley, Robert M; Hunt, Sarah E; Hwang, Stephen; Jain, Miten; Kesharwani, Rupesh K; Lewis, Alexandra P; Li, Heng; Logsdon, Glennis A; Lucas, Julian K; Makalowski, Wojciech; Markovic, Christopher; Martin, Fergal J; Mc Cartney, Ann M; McCoy, Rajiv C; McDaniel, Jennifer; McNulty, Brandy M.

Nature ; 621(7978): 344-354, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Subjects

Chromosomes, Human, Y ; Genomics ; Sequence Analysis, DNA ; Humans ; Base Sequence ; Chromosomes, Human, Y/genetics ; DNA, Satellite/genetics ; Genetic Variation/genetics ; Genetics, Population ; Genomics/methods ; Genomics/standards ; Heterochromatin/genetics ; Multigene Family/genetics ; Reference Standards ; Segmental Duplications, Genomic/genetics ; Sequence Analysis, DNA/standards ; Tandem Repeat Sequences/genetics ; Telomere/genetics

10.

The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Mao, Yafei; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Porubsky, David; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Alexandrov, Ivan A; Eichler, Evan E.

bioRxiv ; 2023 May 30.

Article in English | MEDLINE | ID: mdl-37398417

ABSTRACT

We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp, a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic to each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

11.

Characterization of large-scale genomic differences in the first complete human genome.

Yang, Xiangyu; Wang, Xuankai; Zou, Yawen; Zhang, Shilong; Xia, Manying; Fu, Lianting; Vollger, Mitchell R; Chen, Nae-Chyun; Taylor, Dylan J; Harvey, William T; Logsdon, Glennis A; Meng, Dan; Shi, Junfeng; McCoy, Rajiv C; Schatz, Michael C; Li, Weidong; Eichler, Evan E; Lu, Qing; Mao, Yafei.

Genome Biol ; 24(1): 157, 2023 Jul 04.

Article in English | MEDLINE | ID: mdl-37403156

ABSTRACT

BACKGROUND: The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies are not characterized in detail yet. RESULTS: Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp) excluding telomeric and centromeric regions are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution. CONCLUSION: Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.

Subjects

Genome, Human ; Genomics ; Animals ; Humans ; Segmental Duplications, Genomic ; Multigene Family ; Centromere/genetics ; NK Cell Lectin-Like Receptor Subfamily C/genetics

12.

Centromere Innovations Within a Mouse Species.

Gambogi, Craig W; Pandey, Nootan; Dawicki-McKenna, Jennine M; Arora, Uma P; Liskovykh, Mikhail A; Ma, Jun; Lamelza, Piero; Larionov, Vladimir; Lampson, Michael A; Logsdon, Glennis A; Dumont, Beth L; Black, Ben E.

bioRxiv ; 2023 May 13.

Article in English | MEDLINE | ID: mdl-37333154

ABSTRACT

Mammalian centromeres direct faithful genetic inheritance and are typically characterized by regions of highly repetitive and rapidly evolving DNA. We focused on a mouse species, Mus pahari, that we found has evolved to house centromere-specifying CENP-A nucleosomes at the nexus of a satellite repeat that we identified and term π-satellite (π-sat), a small number of recruitment sites for CENP-B, and short stretches of perfect telomere repeats. One M. pahari chromosome, however, houses a radically divergent centromere harboring ~6 Mbp of a homogenized π-sat-related repeat, π-satB, that contains >20,000 functional CENP-B boxes. There, CENP-B abundance drives accumulation of microtubule-binding components of the kinetochore, as well as a microtubule-destabilizing kinesin of the inner centromere. The balance of pro- and anti-microtubule-binding by the new centromere permits it to segregate during cell division with high fidelity alongside the older ones whose sequence creates a markedly different molecular composition.

13.

Increased mutation and gene conversion within human segmental duplications.

Vollger, Mitchell R; Dishuck, Philip C; Harvey, William T; DeWitt, William S; Guitart, Xavi; Goldberg, Michael E; Rozanski, Allison N; Lucas, Julian; Asri, Mobin; Munson, Katherine M; Lewis, Alexandra P; Hoekzema, Kendra; Logsdon, Glennis A; Porubsky, David; Paten, Benedict; Harris, Kelley; Hsieh, PingHsun; Eichler, Evan E.

Nature ; 617(7960): 325-334, 2023 May.

Article in English | MEDLINE | ID: mdl-37165237

ABSTRACT

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].

Subjects

Gene Conversion ; Mutation ; Segmental Duplications, Genomic ; Humans ; Gene Conversion/genetics ; Genome, Human/genetics ; Polymorphism, Single Nucleotide/genetics ; Haplotypes/genetics ; Exons/genetics ; Cytosine/chemistry ; Guanine/chemistry ; CpG Islands/genetics

14.

Structurally divergent and recurrently mutated regions of primate genomes.

Mao, Yafei; Harvey, William T; Porubsky, David; Munson, Katherine M; Hoekzema, Kendra; Lewis, Alexandra P; Audano, Peter A; Rozanski, Allison; Yang, Xiangyu; Zhang, Shilong; Gordon, David S; Wei, Xiaoxi; Logsdon, Glennis A; Haukness, Marina; Dishuck, Philip C; Jeong, Hyeonsoo; Del Rosario, Ricardo; Bauer, Vanessa L; Fattor, Will T; Wilkerson, Gregory K; Lu, Qing; Paten, Benedict; Feng, Guoping; Sawyer, Sara L; Warren, Wesley C; Carbone, Lucia; Eichler, Evan E.

bioRxiv ; 2023 Mar 07.

Article in English | MEDLINE | ID: mdl-36945442

ABSTRACT

To better understand the pattern of primate genome structural variation, we sequenced and assembled using multiple long-read sequencing technologies the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs based on analysis of these primate lineages. We identify 1,607 structurally divergent regions (SDRs) wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (CARDs, ABCD7, OLAH) and new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species for the first time.

15.

Telomere-to-telomere assembly of diploid chromosomes with Verkko.

Rautiainen, Mikko; Nurk, Sergey; Walenz, Brian P; Logsdon, Glennis A; Porubsky, David; Rhie, Arang; Eichler, Evan E; Phillippy, Adam M; Koren, Sergey.

Nat Biotechnol ; 41(10): 1474-1482, 2023 Oct.

Article in English | MEDLINE | ID: mdl-36797493

ABSTRACT

The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.

Subjects

Diploidy ; Genomics ; Humans ; Sequence Analysis, DNA/methods ; Genomics/methods ; Genome, Human/genetics ; Telomere/genetics ; High-Throughput Nucleotide Sequencing/methods
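
The Verkko abstract above starts from a de Bruijn graph built from long, accurate reads. As a rough illustration only, the sketch below builds a plain (single-k) de Bruijn graph from a few toy reads. It is not Verkko's multiplex graph or its code; the function name `de_bruijn_edges` and the toy reads are assumptions made for this example.

```python
from collections import Counter


def de_bruijn_edges(reads, k):
    """Count de Bruijn edges: each k-mer links its (k-1)-mer prefix to its suffix.

    Hypothetical helper for illustration; not part of Verkko.
    """
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


if __name__ == "__main__":
    # Two overlapping toy reads; real inputs would be long, accurate reads.
    toy_edges = de_bruijn_edges(["ACGT", "CGTA"], k=3)
    for (src, dst), seen in sorted(toy_edges.items()):
        print(f"{src} -> {dst} (seen {seen}x)")
```

Edge counts give a crude measure of read support; a real assembler would go on to simplify the graph, which this sketch does not attempt.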

16.

GAVISUNK: genome assembly validation via inter-SUNK distances in Oxford Nanopore reads.

Dishuck, Philip C; Rozanski, Allison N; Logsdon, Glennis A; Porubsky, David; Eichler, Evan E.

Bioinformatics ; 39(1), 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36321867

ABSTRACT

MOTIVATION: Highly contiguous de novo phased diploid genome assemblies are now feasible for large numbers of species and individuals. Methods are needed to validate assembly accuracy and detect misassemblies with orthologous sequencing data to allow for confident downstream analyses. RESULTS: We developed GAVISUNK, an open-source pipeline that detects misassemblies and produces a set of reliable regions genome-wide by assessing concordance of distances between unique k-mers in Pacific Biosciences high-fidelity assemblies and raw Oxford Nanopore Technologies reads. AVAILABILITY AND IMPLEMENTATION: GAVISUNK is available at https://github.com/pdishuck/GAVISUNK. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Nanopores ; Software ; Humans ; Sequence Analysis, DNA/methods ; High-Throughput Nucleotide Sequencing/methods ; Genome
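
The idea in the GAVISUNK abstract above, checking that distances between unique k-mers in an assembly agree with the distances seen in raw reads, can be sketched on toy strings. This is a simplified illustration under stated assumptions, not the GAVISUNK implementation: the function names, the exact-match k-mer lookup, the `tolerance` parameter, and the tiny k are all hypothetical choices for the example.

```python
from collections import Counter


def find_sunks(assembly, k):
    """Map each k-mer that occurs exactly once in the assembly to its position."""
    kmers = [assembly[i:i + k] for i in range(len(assembly) - k + 1)]
    counts = Counter(kmers)
    return {kmer: pos for pos, kmer in enumerate(kmers) if counts[kmer] == 1}


def supported_pairs(assembly, reads, k, tolerance=0):
    """Return unique-k-mer position pairs whose spacing a read confirms."""
    sunks = find_sunks(assembly, k)
    supported = set()
    for read in reads:
        # (position in read, position in assembly) for every unique k-mer hit
        hits = [(j, sunks[read[j:j + k]])
                for j in range(len(read) - k + 1)
                if read[j:j + k] in sunks]
        for (r1, a1), (r2, a2) in zip(hits, hits[1:]):
            if a2 > a1 and abs((r2 - r1) - (a2 - a1)) <= tolerance:
                supported.add((a1, a2))
    return supported


def unsupported_pairs(assembly, reads, k, tolerance=0):
    """List adjacent unique k-mer pairs in the assembly that no read confirms."""
    positions = sorted(find_sunks(assembly, k).values())
    supported = supported_pairs(assembly, reads, k, tolerance)
    return [pair for pair in zip(positions, positions[1:]) if pair not in supported]


if __name__ == "__main__":
    toy_assembly = "ACGTAGCTTGA"
    # Two toy reads that leave the middle of the assembly unconfirmed.
    print(unsupported_pairs(toy_assembly, ["ACGTAG", "CTTGA"], k=3))
```

Intervals returned by `unsupported_pairs` are candidates for closer inspection rather than confirmed misassemblies; real reads carry sequencing errors, so a practical tool would need a nonzero tolerance and far longer k-mers than this toy uses.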

17.

Semi-automated assembly of high-quality diploid human reference genomes.

Jarvis, Erich D; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A; Carnevali, Paolo; Chaisson, Mark J P; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S; Fulton, Lucinda L; Garg, Shilpa; Gerton, Jennifer L; Ghurye, Jay; Granat, Anastasiya; Green, Richard E; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E; Olson, Nathan D.

Nature ; 611(7936): 519-531, 2022 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-36261518

RESUMO

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Subjects

Chromosome Mapping; Diploidy; Genome, Human; Genomics; Humans; Chromosome Mapping/standards; Genome, Human/genetics; Haplotypes/genetics; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards; Reference Standards; Genomics/methods; Genomics/standards; Chromosomes, Human/genetics; Genetic Variation/genetics

18.

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies.

McCartney, Ann M; Shafin, Kishwar; Alonge, Michael; Bzikadze, Andrey V; Formenti, Giulio; Fungtammasan, Arkarachai; Howe, Kerstin; Jain, Chirag; Koren, Sergey; Logsdon, Glennis A; Miga, Karen H; Mikheenko, Alla; Paten, Benedict; Shumate, Alaina; Soto, Daniela C; Sovic, Ivan; Wood, Jonathan M D; Zook, Justin M; Phillippy, Adam M; Rhie, Arang.

Nat Methods ; 19(6): 687-695, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35361931

ABSTRACT

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.

Subjects

High-Throughput Nucleotide Sequencing; Nanopores; Female; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Humans; Pregnancy; Sequence Analysis, DNA/methods; Telomere/genetics

19.

Complete genomic and epigenetic maps of human centromeres.

Altemose, Nicolas; Logsdon, Glennis A; Bzikadze, Andrey V; Sidhwani, Pragya; Langley, Sasha A; Caldas, Gina V; Hoyt, Savannah J; Uralsky, Lev; Ryabov, Fedor D; Shew, Colin J; Sauria, Michael E G; Borchers, Matthew; Gershman, Ariel; Mikheenko, Alla; Shepelev, Valery A; Dvorkina, Tatiana; Kunyavskaya, Olga; Vollger, Mitchell R; Rhie, Arang; McCartney, Ann M; Asri, Mobin; Lorig-Roach, Ryan; Shafin, Kishwar; Lucas, Julian K; Aganezov, Sergey; Olson, Daniel; de Lima, Leonardo Gomes; Potapova, Tamara; Hartley, Gabrielle A; Haukness, Marina; Kerpedjiev, Peter; Gusev, Fedor; Tigyi, Kristof; Brooks, Shelise; Young, Alice; Nurk, Sergey; Koren, Sergey; Salama, Sofie R; Paten, Benedict; Rogaev, Evgeny I; Streets, Aaron; Karpen, Gary H; Dernburg, Abby F; Sullivan, Beth A; Straight, Aaron F; Wheeler, Travis J; Gerton, Jennifer L; Eichler, Evan E; Phillippy, Adam M; Timp, Winston.

Science ; 376(6588): eabl4178, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35357911

ABSTRACT

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

Subjects

Centromere/genetics; Chromosome Mapping; Epigenesis, Genetic; Genome, Human; Evolution, Molecular; Genomics; Humans; Repetitive Sequences, Nucleic Acid

20.

Epigenetic patterns in a complete human genome.

Gershman, Ariel; Sauria, Michael E G; Guitart, Xavi; Vollger, Mitchell R; Hook, Paul W; Hoyt, Savannah J; Jain, Miten; Shumate, Alaina; Razaghi, Roham; Koren, Sergey; Altemose, Nicolas; Caldas, Gina V; Logsdon, Glennis A; Rhie, Arang; Eichler, Evan E; Schatz, Michael C; O'Neill, Rachel J; Phillippy, Adam M; Miga, Karen H; Timp, Winston.

Science ; 376(6588): eabj5089, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35357915

ABSTRACT

The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.

Subjects

CpG Islands; DNA Methylation; Epigenesis, Genetic; Genome, Human; Centromere/genetics; Centromere/metabolism; Disease/genetics; Genetic Loci; Genomics/standards; Humans; Reference Standards; Sequence Analysis, DNA
